-
Language learning is a complex issue of interest to linguists, computer scientists, and psychologists alike. While the different fields approach these questions at different levels of granularity, findings in one field profoundly affect how the others proceed. My dissertation examines the perceptual and linguistic generalizations regarding the units that make up words (phonemes, morphemes, and vocal quality) in Polish and English to better understand how both humans and computers formulate these concepts in language. I use computational modeling and machine learning to investigate Polish morphophonology in two ways. First, I examine consonant clusters at the beginning of Polish words to see what parameters determine human-like learnability, compared to a survey of native speakers. I run several studies to compare learning with gradient or categorical data, each at the cluster, bigram, and featural levels. Second, I examine Polish yer alternation and study whether machine learning approaches can generalize morphophonological information to target this pattern when given a larger Polish. Using low-level neural networks and a classification-and-regression tree (CART) decision algorithm, I examine how well they use morphological and phonological information to make generalizations that capture a small subset of the Polish vocabulary. Additionally, I conduct a psycholinguistic experiment with English speakers to further establish what level of attention listeners may give when building phonological representations. I test this by extending a previous study finding that real-word primes make rejection of nonword primes more difficult, determining that the effect generalizes across speakers. This research addresses a tension in modeling the computational problem of language learning between the formalization of representation and the mechanics of the learning apparatus.
Different levels of abstraction can give more sophisticated insight into the data at hand, but at a cost that may not be representative of human learning. I argue that computational linguistic questions such as these provide an interesting window into the strengths and limitations of machine learning questions as compared to the human language learning faculty. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by Telephone (800) 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.] ERIC # ED663172
-
How might data analytic tools support intake decisions? When faced with a request for post-conviction assistance, innocence organizations’ intake staff must determine (1) whether the applicant can be shown to be factually innocent, and (2) whether the organization has the resources to help. These difficult categorization decisions are often made with incomplete information (Weintraub, 2022). We explore data from the National Registry of Exonerations (NRE; 4/26/2023, N = 3,284 exonerations) to inform such decisions, using patterns of features associated with successful prior cases. We first reproduce the latent class analysis of Berube et al. (2023), identifying four underlying categories across cases. We then apply a second technique, decision tree analysis (WEKA; Frank et al., 2013), to increase transparency. Decision trees can decompose complex patterns of data into ordered flows of variables, with the potential to guide intermediate steps that could be tailored to the particular organization’s limitations, areas of expertise, and resources.
-
MLRegTest is a benchmark for machine learning systems on sequence classification, which contains training, development, and test sets from 1,800 regular languages. MLRegTest organizes its languages according to their logical complexity (monadic second order, first order, propositional, or monomial expressions) and the kind of logical literals (string, tier-string, subsequence, or combinations thereof). The logical complexity and choice of literal provide a systematic way to understand different kinds of long-distance dependencies in regular languages, and therefore to understand the capacities of different ML systems to learn such long-distance dependencies.
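To make the contrast between string and subsequence literals concrete, the following is a minimal sketch of how labeled data for two toy regular languages might be produced. The languages, the alphabet, and every function name here are hypothetical illustrations and are not taken from MLRegTest itself; the sketch only assumes that a string literal constrains adjacent symbols while a subsequence literal constrains symbols at any distance.

```python
from itertools import product


def has_substring(s: str, factor: str) -> bool:
    """True if `factor` occurs in `s` as a contiguous block."""
    return factor in s


def has_subsequence(s: str, pattern: str) -> bool:
    """True if the symbols of `pattern` occur in `s` in order, gaps allowed."""
    remaining = iter(s)
    # `ch in remaining` consumes the iterator up to the first match,
    # so each pattern symbol must be found after the previous one.
    return all(ch in remaining for ch in pattern)


# Hypothetical toy languages (assumptions for illustration only).
def local_lang(s: str) -> bool:
    """Accepts strings with no adjacent 'ab' (a local dependency)."""
    return not has_substring(s, "ab")


def long_distance_lang(s: str) -> bool:
    """Accepts strings with no 'a' followed anywhere later by 'b'."""
    return not has_subsequence(s, "ab")


def label_strings(alphabet: str, max_len: int, accepts) -> list:
    """Enumerate all strings up to `max_len` and pair each with its label."""
    data = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            data.append((s, accepts(s)))
    return data


if __name__ == "__main__":
    for s, label in label_strings("abc", 2, long_distance_lang):
        print(repr(s), label)
```

A string such as `acb` separates the two toy languages: it contains no adjacent `ab`, so `local_lang` accepts it, but the `a` is eventually followed by a `b`, so `long_distance_lang` rejects it. A classifier that only inspects neighboring symbols would therefore be expected to handle the first language and struggle with the second.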
-
Minimalist grammars have been criticized for their inability to analyze successive-cyclic movement and multiple wh-movement in a manner that is faithful to the Minimalist literature. Persistent features have been proposed in the literature as a potential remedy. We show that not all persistent features are alike. The persistent features involved in multiple wh-movement do not increase subregular complexity, making this phenomenon appear very natural from the perspective of MGs. The persistent features in successive-cyclic movement, on the other hand, change the subregular nature of movement, favoring an alternative treatment along the lines of Kobele (2006).